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ABSTRACT 


_J  Proliferation  of  techniques  and  lack  of  unifying  structure  has  hindered  both  study 
and  application  of  variance  reduction.  Our"  recently  developed  unifying  structure, 
presented  In  detail  elsewhere,  Is  a  taxonomy  that  views  each  variance  reduction 
technique  (VRT)  as  a  transformation  from  one  experiment  to  another.  The  taxonomy  Is 
exhaustive  In  that  any  VRT  can  be  expressed  as  a  composition  of  elemental 


transformations  from  six  classes. 

This  paper  Illustrates  -otrr  taxonomy  by  using  the  notation,  terminology,  and 


concepts  of  the  taxonomy  to  discuss  seven  VRTs,  which  we  assume  are  familiar  to  the 
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reader,  objective  is  to  use  the  reader's  knowledge  of  these  well-known  techniques 


to  provide  an  Introduction  and  overview  of  the  taxonomy. 
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expectations,  control  variates,  Importance  sampling,  Monte  Carlo^  poststratlfled 


sampling,  simulation,  swindles,  variance  reduction 


Accession  For  ^ 

NT IS  GRA&I 

ur 

DTIC  TAB 

Unannounced 

□ 

Justification _ 

— j 

Distribution/ 


Availability  f.r.des 
jAvail  eud/or 


/Diat 

Sped 

A-1 

» 


i 


1.  INTRODUCTION 


A  computer  simulation  model  of  a  real  or  conceptual  system  consists  of  a 
probabilistic  representation  of  those  elements  In  the  system  that  cannot  be  predicted 
with  certainty  and  deterministic  rules  that  define  the  system's  reaction  to  realizations  of 
the  uncertain  elements.  For  Instance,  In  a  model  of  a  queueing  system  the  time  between 
arrivals  Is  uncertain  while  the  queue  discipline  specifies  a  rule  for  processing  customers 
after  arrival.  Such  a  model  mimics  the  actual  functioning  of  the  system,  or  at  least  the 
essential  elements  of  It.  A  simulation  experiment  Is  performed  by  generating 
realizations  of  the  model  and  computing  estimates  of  performance  measures. 

Variance  Reduction  Techniques  (VRTs)  are  transformations.  They  transform  a 
simulation  experiment  Into  a  related  experiment  that  yields  better  estimators  of  the 
performance  measures,  where  better  means  more  precise.  This  gain  In-  precision  may  be 
at  the  expense  of  the  one-to-one  correspondence  between  the  model  and  the  system.  See 
Bratley,  Fox  and  Schrage  (1083)  and  Wilson  (1083a,  1085)  for  surveys  of  variance 
reduction. 

In  Nelson  and  Schmelser  (1084b)  we  Identify  six  classes  of  transformations  that 
exhaust  the  set  of  all  possible  variance  reduction  techniques  under  composition.  "The 
derivation  of  the  six  classes  Is  based  on  a  mathematical-statistical  definition  of 
simulation  experiments  developed  specifically  for  studying  variance  reduction  (Nelson 
and  Schmelser,  1084a).  We  are  proposing  a  taxonomy  for  variance  reduction. 

Given  any  taxonomy  of  variance  reduction,  what  should  It  do  to  be  useful? 
Certainly  It  should  unify  the  field  as  It  currently  exists  and  also  bring  new  Insight  Into 
what  could  potentially  exist.  To  be  specific,  we  think  a  taxonomy  should:  (a)  eliminate 
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confusion  regarding  the  characteristics  of,  and  relationships  among,  VRTs;  (b)  provide  a 
common  language  for  communication  In  practice,  research,  and  Instruction;  (c)  help 
practitioners  determine  appropriate  variance  reduction  strategies;  (d)  generate  new 
variance  reduction  Ideas;  and  (e)  provide  a  basis  for  automation  of  variance  reduction. 

We  return  to  these  five  criteria  In  section  5  after  Illustrating  the  taxonomy  In 
section  4.  The  examples  of  section  4  are  based  on  graphical  symbols  presented  In 
section  3  and  definitions  of  simulation  experiment  and  of  the  six  classes  of 
transformations  presented  In  section  2. 

2.  BACKGROUND 

Descriptions  are  In  terms  of  matrices,  columns  of  matrices,  and  elements  of 
matrices.  Letters,  Greek  or  Roman,  without  subscripts  denote  matrices,  letters  with 
single  subscripts  denote  columns,  and  doubly  subscripted  letters  are  scalar  elements, 
using  the  usual  row-column  convention.  For  Instance,  Xu,  Is  the  »**  element  of  column 
vector  Xk,  which  Is  the  *a  column  In  the  matrix  X.  For  our  purposes,  a  matrix  need 
not  have  elements  In  all  positions,  since  elements  of  sets  are  arranged  In  rows  and 
columns  for  conceptual  rather  than  computational  reasons. 

A  letter  with  subscripts  In  parentheses  Indicates  a  set  of  variables  with  Indices  In  a 
fixed  set.  For  example,  X(tk)  denotes  all  elements  Xit  In  X  with  subscripts  In  index  set 
(«6),  a  set  that  would  have  to  be  defined.  Thus  (.)  Is  a  mapping  from  a  single  Index  to  a 
set  of  Indices. 

Random  variables  are  denoted  by  capital  Roman  letters,  and  realizations  of  these 
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random  variables  by  lower  case  letters.  For  example,  ya  Is  a  realization  of  random 
variable  Ya.  Any  notation  that  Is  counter  to  the  above  conventions  Is  specifically 
defined  as  needed. 

For  our  purposes,  a  simulation  experiment  Is  a  description  of  a  system  of  random 
variables;  the  dimensions  of  this  system  may  also  be  random  variables.  Given  a  source 
of  randomness  (usually  Independent  U(0,l)  random  variables),  realizations  of  the  system 
can  be  generated.  Based  on  the  definitions  In  Nelson  and  Schmelser  (1084a),  the 
random  variables  are  partitioned  Into  inputs,  outputs  and  statistics.  These  sets  have 
precise  definitions,  but  can  be  described  loosely  as  follows: 

Inputs  are  random  variables  defined  by  known  (although  possibly  only 
conditionally)  probability  distributions.  Examples  are  service  and  Interarrival  times  of  a 
queueing  simulation,  or  the  demand  per  period  of  an  Inventory  system.  Another 
example  is  a  service  time  whose  distribution,  conditional  on  the  number  of  customers  In 
the  system,  Is  known.  The  countably  Infinite  matrix  of  Inputs  Is  denoted  by  X  and  has 
Joint  cumulative  distribution  function  F(z). 

Outputs  are  random  variables  defined  by  known,  deterministic  functions  of  the 
Inputs.  They  are  the  observations  of  system  performance,  such  as  the  delay  experienced 
by  customers  In  the  queueing  simulation  or  the  number  of  backorders  In  the  Inventory 
system.  The  probability  distribution  of  the  outputs  Is  not  known,  but  the  functions 
define  how  outputs  are  realized  from  the  Inputs.  The  output,  Y,  Is  the  matrix  of  all 
essential  random  variables  defined  by  functions  of  X,  In  the  sense  that  all  remaining 
random  variables  that  are  functions  of  X  can  be  derived  from  Y ,  provided  no  element  of 
Y  Is  deleted.  The  outputs  are  denoted  by  Y  =  g(X-,R.),  where  R.  Is  the  sampling  plan; 
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the  sampling  plan  defines  a  stopping  rule  for  the  simulation  experiment  In  terms  of  the 


number  of  realizations  In  columns  of  Y. 

Statistics  are  functions  that  aggregate  outputs  Into  point  estimators  of  the 
performance  measures  of  Interest.  A  sample  mean  Is  often  used.  Variance  reduction 
refers  to  reducing  the  variance  of  these  statistics.  The  statistics  are  denoted  by 
Z  =  h(Y),  and  the  performance  measures  of  Interest  by  0;  Z  and  0  are  row  vectors  of  the 
same  dimension. 

If  Z  and  0  are  scalars,  and  Z  Is  an  unbiased  estimator  of  0,  then 

Var  (Z )  =  /  jA  (t  (x  ;R. )  -  dF  (x ) 

VRTs  are  transformations  of  simulation  experiments  that  alter  the  Inputs,  outputs  and 
statistics  to  reduce  Var  ( Z ).  If  we  hold  9  and  the  sample  space  of  X  fixed,  then  variance 
reduction  must  be  accomplished  by  redefining  F,  g ,  R.,  and/or  fc..  Six  classes  of 
transformations,  defined  loosely  here,  that  exhaust  the  possibilities  are: 

—  Distribution  Replacement  (DR):  Redefine  the  scalar  marginal  distributions  of 
the  Inputs  without  altering  any  statistical  dependencies  among  the  Inputs. 

—  Dependence  Induction  (DI):  Redefine  the  statistical  dependencies  among-the 
scalar  Inputs  without  altering  any  marginal  distributions  of  the  Inputs. 

—  Equivalent  Allocation  (EA):  Redefine  the  functions  from  Inputs  to  outputs 
without  altering  the  allocation  of  sampling  effort. 

—  Sample  Allocation  (SA):  Redefine  the  allocation  of  sampling  effort  without 
altering  the  functions  that  define  the  outputs. 
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—  Equivalent  Information  (El):  Redefine  the  functions  from  outputs  to  statistics 
without  altering  the  argument  set  of  the  statistics. 

—  Auxiliary  Information  (AI):  Redefine  the  argument  set  of  the  statistics 
without  altering  the  functions  from  outputs  to  statistics. 

Transformations  In  DR  and  DI  redefine  F;  those  In  EA  and  SA  redefine  g  and  R., 
respectively;  and  those  In  El  and  AI  redefine  h .  The  mathematical  rigor  needed  to 
prove  properties  such  as  exhaustiveness  Is  presented  In  Nelson  and  Schmelser  (1084a, b); 
precise  definitions  are  needed  to  make  the  partitioning  of  Inputs,  outputs  and  statistics 
unambiguous,  and  to  make  the  classes  of  transformations  distinct.  However,  the  loose. 
Intuitive  definitions  given  here  are  more  useful  for  our  current  purpose. 


3.  SYMBOL  SET 

To  augment  the  discussion  of  the  seven  VRTs  In  section  4,  a  graphical 
representation  of  each  VRT  Is  given.  Only  three  symbols  are  needed,  as  Illustrated  In 
Figure  1.  Rectangles  enclose  a  definition  of  an  Input,  output,  or  statistic  In  the 
simulation  experiment.  Circles  enclose  a  class  of  transformations.  Trapezoids  contain 
the  prior  knowledge  used  to  make  application  of  the  transformation  possible  and 
reasonable. 


Figure  1  about  here 


By  prior  knowledge  we  mean  any  knowledge,  either  known  with  certainty  or 
suspected,  beyond  that  necessary  to  design  the  original  (“crude’*)  simulation  experiment. 
Since  prior  knowledge  Is  often  difficult  to  state  succinctly,  and  since  our  purpose  here  Is 
to  Illustrate  the  taxonomy  rather  than  to  provide  every  detail  of  the  VRTs  discussed, 
the  prior  knowledge  specified  In  Figures  2-8  Is  often  minimal.  Complete  specification  Is 
often  clear  by  considering  Implementation  of  the  VRT. 

The  progression  In  each  figure  Is  from  left  to  right,  proceeding  from  a  definition  of 
some  Input,  output,  or  statistic  to  a  new  definition  via  a  transformation. 


4.  DECOMPOSITION  OF  VARIANCE  REDUCTION  TECHNIQUES 

The  VRTs  considered  are  antithetic  variates,  common  random  numbers,  control 
variates,  stratified  sampling,  poststratlfylng  the  sample,  conditional  expectations,  and 
Importance  sampling.  These  seven  were  selected  specifically  because  they  are  well- 
known  and  understood,  and  thus  they  provide  a  convenient  Introduction  to  our  variance 
reduction  taxonomy.  For  each  VRT,  a  description  of  the  VRT  and  graphical  display  of 
Its  decomposition  Is  presented.  The  purpose  Is  not  to  propose  a  comprehensive 
definition  of  these  seven  VRTs,  but  rather  to  Illustrate  our  taxonomy  using  well-known 
examples. 

Each  transformation  In  the  decomposition  of  a  VRT  may  not,  by  Itself,  reduce 
variance  or  even  yield  an  acceptable  experiment.  Variance  reduction  Is  achieved  by  the 
combined  effect  of  all  the  relevant  transformations. 
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4.1  Antithetic  Variates  (AV) 


Antithetic  Variates  Is  a  VRT  that  has  been  extensively  studied  In  the  context  of 
Monte  Carlo  estimation.  The  technique  first  appeared  In  Hammersley  and  Morton 
(1056),  with  further  early  developments  by  Hammersley  and  Mauldon  (1056),  Morton 
(1057),  Halton  and  Handscomb  (1057),  and  Handscomb  (1058).  In  Its  broadest  sense, 
"we  use  the  term  antithetic  variates  to  describe  any  set  of  estimators  which  mutually 
compensate  each  other’s  variations"  (Hammersley  and  Handscomb,  1064,  p.  61).  Statist¬ 
ical  results  such  as 

Var  (Y,  ±Yj  )  =  Var  ( V, )  +  Var  (Y,  )  ±  2Cov  (V, ,  Y, )  (l) 

Indicate  the  advantage  of  forcing  correlation  among  outputs  while  maintaining  their 

marginal  distributions.  Antithetic  variates  attempts  to  Induce  negative  correlations 

among  Identically  distributed  simulation  outputs. 

Consider  estimating  0„  a  real  scalar,  using  a  simulation  experiment  defining 

Vi-1  =  y.i(AT(,i,)  1=1,2 . /,=2n 

where  E(Yiy)  =  0,  and  X{il)  Is  a  set  of  Inputs  Indexed  by  i,  with  statistic 

Z,  =  /,"*  E  Yit 

i  —I 

Further,  suppose  that 

-V(,i)  ~  i.i.d.  F(„(x(tI))  i'=l,2 . 2n  (2) 

The  usual  AV  transformation  Is  to  redefine  the  Joint  distribution  of 

(-^(ai-i.n  >  -^(3*,i))  *  =1,2 . n 

such  that  the  marginals  given  In  (2)  are  unchanged,  but  the  pairs  are  negatively  corre¬ 
lated.  When  A'(ll)  Is  a  scalar,  or  If  AV  Is  used  only  on  a  scalar  component  of  the 
correlation  Is  most  often  Induced  by  generating  realizations  via  the  Inverse  cumulative 
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distribution  function  (cdf)  of  X(il)  In  the  following  manner: 


-^(af-i.i)  —  F  \i)(^i') 

*<w.,)  =  F-'ud-Ut) 

where  U(  ~  l.l.d.  U(0,1)  i  —  i,  2. ....  n ;  this  Induces  the  minimal  achievable  covariance 
between  the  Inputs  X(ai _ltl)  and  AT(a(il)  with  the  given  marginal  distributions  (Whitt, 
1970).  The  resulting  Joint  cdf  Is 

max  {F(1)(z(a, +  F(l)(x(afiI))  -  1,  0} 

Thus,  AV  Is  composed  of  a  single  transformation  from  the  Dependence  Induction  (DI) 
class,  as  shown  In  Figure  2. 


Figure  2  about  here 


When  correlation  Is  to  be  Induced  among  k-tuples  (k  >  2),  there  are  a  variety  of 
approaches  and  objectives  (see  for  Instance,  Fishman  and  Huang,  1983).  Although  more 
complicated  mathematically,  the  k-tuple  case  still  Involves  only  a  single  transformation 
from  DI. 

The  reason  for  Inducing  dependence  among  the  Inputs  Is  to  cause 

cov  (ya,-_M ,  yai.i)  <  0 

which  reduces  the  variance  of  Z ,  via  (1).  Negative  covariance  between  the  outputs  Is 
not  guaranteed  by  negative  covariance  between  the  Inputs.  However,  Wilson  (1983) 
showed  that  If  the  Inputs  are  generated  via  the  Inverse  cdf  (as  shown  above)  and  Is 
monotone,  then  the  minimum  achievable  covariance  between  ya,-M  and  y9liI  will  be 
achieved.  However,  achieving  the  bound,  or  even  reducing  variance  at  all,  Is  not  a  fac- 
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tor  In  classifying  these  methods  as  AV  or  the  transformation  as  AI.  The  only  relevant 


factor  Is  that  the  correlation  Is  induced  by  redefining  the  Inputs  while  maintaining  tfceh 
marginal  distributions. 

This  redefinition  Is  usually  accomplished  by  making  the  Inputs  deterministically 
dependent.  When  an  estimator  consists  of  a  sum  of  n  outputs,  the  antithetic-variates 
theorem  (Hammersley  and  Mauldon,  1950,  Handscomb,  1058,  Wilson,  1079,  1983)  states 
that  under  fairly  general  conditions  the  greatest  lower  bound  of  the  variance  of  the  esti¬ 
mator  can  be  approached  arbitrarily  closely  by  generating  all  n  Inputs  from  a  deter¬ 
ministic  transformation  of  a  single  randomly  sampled  Input.  But  whether  or  not  corre¬ 
lation  Is  induced  deterministically,  the  transformation  Is  still  DI  In  our  taxonomy. 


4.2  Common  Random  Numbers  (CRN) 

Common  random  numbers  Is  often  called  correlated  sampling  (CS).  Confusion  can 
arise  because  CRN  Is  both  a  method  for  generating  correlated  samples  and  a  VRT  that 
exploits  Induced  correlation.  “The  name  of  the  technique  stems  from  the  possibility  In 
some  situations  of  using  the  same  stream  of  basic  U(0,1)  random  variables  to  drive  each 
of  the  alternative  models  through  time...”  (Law  and  Kelton,  1982,  p.  350).  We  use  the 
term  CRN  In  the  sense  of  CS,  meaning  that  correlation  Is  Induced  (by  whatever  means) 
between  certain  Inputs  to  obtain  positively  correlated  outputs  for  estimating  the 
difference  between  two  performance  measures.  CRN  has  the  distinction  of  being  "...the 
only  VRT  that  Is  as  a  rule  used  by  practitioners  of  simulation"  (Kleljnen,  1974,  p.  200). 

Consider  estimating 
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transformed  In  the  simulation  experiment.  The  decomposition  of  a  VRT  depends,  for 


Instance,  on  what  the  crude  experiment  was.  Also,  two  techniques  that  emphasize  sam¬ 
pling  based  on  importance  may  differ  greatly  In  how  they  achieve  It  (see  STRAT  and  IS, 
above).  In  this  paper,  the  decomposition  of  each  VRT  should  be  considered  In  light  of 
the  definition  of  the  VRT  presented  here;  It  may  not  (and  probably  will  not)  be  the 
same  for  every  application  that  goes  by  the  same  name. 

Tables  I  and  IT  summarize  the  decompositions  presented  In  this  paper.  Any 
decomposed  VRT  lies  In  a  cell  of  one  of  these  or  a  higher  order  table.  Six  of  the  seven 
VRTs  discussed  are  shown.  The  seventh,  Importance  sampling,  lies  In  the  fourth  order 
table  in  cell  (DR,  EA,  El,  AI). 

We  conjecture  that  If  all  known  VRTs  were  decomposed  and  entered  Into  these 
and  higher  order  tables,  there  would  be  empty  cells.  Why  are  these  cells  empty?  Do 
they  suggest  new  VRTs  to  be  discovered,  or  are  there  some  combinations  that  are 
Infeasible?  Our  taxonomy  suggests  the  openings  and  provides  a  foundation  on  which  to 
develop  further  structure  for  addressing  such  Issues. 


Tables  I  and  II  about  here 
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for  general  simulation  experiments.  The  underlying  motivation  for  the  creation  of  our 


taxonomy  was  the  observation  that  variance  reduction,  despite  Its  obvious  benefit,  Is  sel¬ 
dom  used  because  of  the  difficulty  In  learning  variance  reduction  Ideas,  the  difficulty  In 
Implementing  many  of  the  techniques,  and  the  likelihood  of  Improperly  Implementing  a 
technique,  thereby  Invalidating  the  experiment.  Any  hope  of  widespread  use  of  variance 
reduction  depends  upon  the  automation  of  these  methods.  Automation  needs  to  occur 
for  both  the  Identification  of  appropriate  variance  reduction  Ideas  and  for  the  Implemen¬ 
tation.  The  long  term  goal  Is  an  Interactive  system  capable  of  looking  at  a  computer 
simulation  model  and  asking  good  questions  about  the  available  prior  knowledge  (recall 
the  definition  of  prior  knowledge)  and  suggesting  good  variance  reduction  Ideas.  In  con¬ 
junction  with  a  particular  language  such  a  system  would  then  Implement  the  Ideas.  But 
any  automated  system  needs  a  “world"  In  which  to  work.  We  hope  that.  Just  as  our 
taxonomy  has  provided  a  world  view  for  us  to  Identify  appropriate  VRTs  directly,  the 
taxonomy  will  do  the  same  for  automated  systems. 


In  discussing  these  five  criteria,  we  are  making  an  Important  distinction  between 
the  design  and  the  analysis  of  VRTs.  Criteria  (c),  (d),  and  (e)  above  concern  design, 
while  the  present  paper  Illustrates  the  analysis  aspect  (criteria  (a)  and  (b))  by  demon¬ 
strating  how  VRTs  are  decomposed. 

We  are  often  asked:  "In  terms  of  this  taxonomy,  what  Is  VRT  x?”  Our  answer  Is 
usually  to  give  the  decomposition  for  what  we  consider  to  be  the  most  common  form  of 
VRT  x,  as  was  done  In  this  paper.  A  better  answer  would  be  “It  depends."  A  strength 
of  our  taxonomy  Is  that  It  does  not  categorize  VRTs,  but  reflects  what  Is  actually 
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experience  has  been  that  once  a  simulation  experiment  Is  expressed  In  terms  of  the 
Inputs,  outputs  and  statistics,  then  examining  the  experiment  from  the  perspective  of 
the  six  classes  of  transformations  facilitates  the  Identification  of  appropriate  variance 
reduction  strategies.  In  a  variety  of  real  and  textbook  cases,  we  have  found  that  think¬ 
ing  In  terms  of  the  taxonomy  provided  a  ready-made  list  of  ideas  (the  six  classes), 
helped  Identify  the  relevant  prior  knowledge,  suggested  generalizations  and 
modifications  of  standard  techniques,  and  Indirectly  provided  a  sense  of  when  to  stop 
the  search  for  more  variance  reduction  Ideas.  Of  course,  It  Is  difficult  to  determine 
whether  the  Improved  ability  to  find  variance  reduction  Ideas  Is  due  to  the  Insight 
gained  from  the  months  spent  creating  the  taxonomy  or  from  the  taxonomy  Itself.  Only 
the  experience  of  others  will  tell. 

(d)  The  taxonomy  generates  new  variance  reduction  Ideas.  While  our  taxonomy 
Is  not  an  “erector  set"  of  components  from  which  VRTs  are  directly  assembled.  It  does 
foster  discovery  of  new  Ideas  by  expanding  rigid  definitions  of  standard  VRTs  (see  CV 
above,  for  example).  In  terms  of  the  six  classes  of  transformations,  no  VRT  Is  really 
new,  it  Is  Just  an  expansion  of  existing  Ideas  made  possible  by  a  broader  perspective. 
Examples  of  such  Ideas  that  the  authors  have  developed  (and  have  not,  to  our 
knowledge,  appeared  In  the  literature)  Include:  (1)  using  nonlinear  control  variates  for 
estimating  probabilities,  where  standard  linear  controls  may  yield  infeasible  estimates, 
(11)  using  the  difference  between  two  analytic  results  as  control  variates,  and  (111)  using 
distribution  replacement  to  make  optimal  sample  allocation  possible  (a  difficult  concept 
before  the  two,  often  confused.  Ideas  were  distinctly  separated). 

(e)  The  taxonomy  provides  a  basis  for  research  on  automated  variance  reduction 
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Introduction  In  light  of  the  taxonomy  presented  In  this  paper: 


(a)  The  taxonomy  reduces  the  confusion  that  currently  exists  regarding  the  funda¬ 
mental  characteristics  of,  and  relationships  among,  VRTs.  Nelson  (1983)  listed  eighty 
names  of  VRTs  that  have  appeared  In  the  literature.  Many  are  different  names  for 
almost  (If  not  precisely)  the  same  technique  and  many  fundamentally  different  tech¬ 
niques  have  quite  similar  (or  Identical)  names.  These  similarities  and  differences  are 
more  apparent  when  VRTs  are  decomposed  Into  their  basic  transformations,  as  the 
examples  In  this  paper  Illustrate. 

(b)  The  taxonomy  provides  a  common  language  for  communication  among 
researchers,  between  researchers  and  practitioners,  and  among  practitioners.  The  prob¬ 
lem  of  multiple  names  for  roughly  equivalent  (where  we  can  now  say  what  roughly 
equivalent  means)  VRTs  should  be  less  of  an  issue.  The  similarities  and  differences 
between  VRTs  are  apparent  when  they  are  decomposed  Into  their  elemental  transforma¬ 
tions.  The  existence  of  classes  of  transformations  makes  possible  the  unification  of 
theoretical  results  concerning  conditions  that  Insure  effectiveness  of  VRTs,  results  that 
presently  exist  In  a  fragmented  fashion  In  the  literature  and  have  been  of  limited  practi¬ 
cal  use. 

Ideally  the  taxonomy  would  also  be  a  useful  tool  for  teaching  variance  reduction  to 
simulators  (students  and  practitioners),  but  our  experience  In  this  area  has  been  disap¬ 
pointing.  The  reason,  and  a  motivation  for  this  paper.  Is  that  learning  proceeds  most 
naturally  from  specific  to  abstract. 

(c)  The  taxonomy  provides  a  different,  and  we  think  more  effective,  approach  for 
practitioners  to  find  variance  reduction  strategies  In  simulation  experiments.  Our 


original  outputs  Yix  and  the  auxiliary  Information  Yia  (Al).  Generalization  to  altering 


the  distribution  of  a  random  vector,  rather  than  the  scalar  Xm,  does  not  change  the 
decomposition. 


.  Figure  8  about  here 


The  decomposition  also  does  not  depend  upon  the  choice  of  the  new  Input  distri¬ 
bution  /  &  (za ),  which  typically  Is  chosen  to  be  approximately  proportional  to 
|  E(Yix  |  x*)  |  /alia,),  where  the  expectation  Is  over  X(il).  This  measure  of  Importance  Is 
roughly  the  product  of  output  magnitude  |  y,,  |  and  Input  likelihood.  See  Kahn  (1056) 
for  additional  details. 

5.  CONCLUSIONS 

The  .examples  In  this  paper  Illustrate  the  variance  reduction  taxonomy  developed 
In  Nelson  and  Schmelser  (1984a, b).  The  analysis  of  variance  reduction  techniques  by 
decomposing  them  Into  members  of  the  six  classes  of  transformations  Is  central.  Other 
attempts  to  develop  a  taxonomy  of  variance  reduction,  such  as  McGrath  and  Irving 
(1073)  and  Wilson  (1983a,  1985),  were  not  completely  successful,  In  part  because  they 
tried  to  categorize  VRTs.  But  as  we  have  seen  In  section  4,  simple  partitions  do  not 
exist.  VRTs  overlap  In  the  sense  that  they  are  composed  of  transformations  from  com¬ 
mon  classes. 

Returning  to  the  question  raised  In  the  Introduction  —  Why  Is  a  taxonomy  of 
variance  reduction  useful?  —  we  now  discuss  the  five  criteria  mentioned  In  the 


region.  In  contrast  to  STRAT,  which  directly  allocates  sampling  effort,  IS  biases  the 


outputs  by  altering  the  probability  distributions  of  the  Inputs. 

IS  Is  a  standard  technique  In  Monte  Carlo  estimation  problems;  see  for  Instance 
Hammersley  and  Handscomb  (1064)  and  Kahn  (1056).  The  technique  Is  used  Infre¬ 
quently  In  systems  simulation  because  the  effect  of  altering  the  Input  distribution  Is 
often  difficult  to  derive.  See  Kleljnen  (1074)  for  a  general  discussion,  and  Jeruchlm 
(1084)  for  an  example. 

A  simple  version  of  IS  that  Illustrates  the  central  Idea  Is  given  here.  Consider 
estimating  0»  by  /,  observations  of  y,„  where  E(Yit)  —  0„  »  =  l,  2 /,.  The  crude  esti¬ 
mator  of  0,  might  be 

Zx  =  *,(>',„)  e  n, 

»— i 

Now  suppose  Yu  Is  a  function  of  (X(l  l).  X»)  for  some  fixed  column  Index  k  of  X.  For 
notatlonal  convenience  write 


Yi  i  —  fi  i  {Xu, ) 

suppressing  the  X,(-  0.  Assume  that  X*  are  l.l.d.  random  variables  for  all  i  with  Identical 
discrete  or  continuous  marginal  pdf  /<*(*,*)•  Consider  some  other  pdf  /A  of  the  same 
type  and  having  the  same  support  as  /#.  If  X<*  Is  sampled  from  /  A  and  If  X(<1)  Is  sam¬ 
pled  from  the  unaltered  conditional  distribution  of  X(f„  given  X*  =  **,  then 


fa  (Xa ) 


'X 

C 

i-l 


=  /f‘  E  YixYi 


•’9 


"-'■"fi*1- TawTT 

Is  the  (unbiased)  IS  estimator  of  0,.  As  shown  In  Figure  8,  IS  employs  the  new  input  dis¬ 
tribution  FA  (DR),  the  new  outputs  y  9  based  on  the  prior  knowledge  of  the  old  and 


new  Input  distributions  (EA),  and  the  new  statistic  (El)  that  averages  the  product  of  the 
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Is  known  or  can  be  calculated  for  all  realizations  *,a  of  7a,  where  here  7,  Is  generic  for 
any  of  Yiit  then  based  on  the  well-known  relation 

Var  (£(7,|  yia))  =  Var  (7,)  -  E  (Var  (7,  |  7a)) 
the  conditional  expectation  estimator 

la 

Z{  =  h ,'( 7 a)  =  7a~‘  E  F(7,  |  7a)  (8) 

•  —  1 

can  be  used.  As  shown  In  Figure  7,  based  on  the  (possibly  only  suspected)  prior 
knowledge  that  E(Var(7,  |  ya))>o,  CE  uses  7,a  as  auxiliary  Information  (AI)  In  the 
modified  estimator  (El)  based  on  the  constants  E(7,  |7a)  obtained  from  prior 
knowledge. 

The  estimator  (8)  Is  unbiased  for  0t,  and  If  7,  ==  /a  and  the  ya  are  Independent 
then  It  has  no  greater  variance  than  (7).  However,  CE  Is  often  employed  when  7a  >  7„ 
such  as  when  7,  are  results  of  “rare  events."  Clearly  the  estimator  (8)  may  be  based  on 
a  vector  of  outputs,  not  Just  a  scalar  7,a,  but  this  does  not  affect  Its  decomposition. 
Note  that  7,  has  not  been  redefined,  but  rather  other  outputs  (auxiliary  Information) 
In  the  simulation  experiment  are  used. 


Figure  7  about  here 

4.7  Importance  Sampling  (IS) 

Importance  sampling  Is  one  of  several  VRTs  that  attempt  to  concentrate  sampling 
In  regions  of  Interest,  where  interest  may  be  related  to  the  variance  within  the  region, 
the  likelihood  of  observations  In  the  region,  and/or  the  magnitude  of  observations  In  the 
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Figure  6  about  here 


4.6  Conditional  Expectations  (CE) 

The  conditional  expectations  method  Is  often  called  conditional  Monte  Carlo 
(CMC).  However,  CMC  also  refers  to  a  sampling  technique  developed  by  Trotter  and 
Tukey  (1050)  to  “use  a  family  of  transformations  to  convert  given  samples  into  samples 
conditioned  on  a  given  characteristic  (p.  64).“  Dubl  and  Horowitz  (1070),  Granovsky 
(1081),  and  Wilson  (1085)  discuss  CMC  In  detail.  But  other  than  to  mention  that  the 
original  CMC  employs  a  transformation  In  EA,  we  do  not  discuss  CMC  further  here, 
since  the  background  and  detail  needed  to  decompose  CMC  requires  the  precise 
definitions  of  our  taxonomy. 

We  reserve  the  term  conditional  Monte  Carlo  for  the  original  sampling  technique. 
Conditional  expectations  (CE)  Is  used  here  as  In  Law  and  Kelton  (1082),  where  the 
expected  value  of  the  output  of  Interest  Is  replaced  by  Its  known  conditional  expected 
value. 

Consider  estimating  0,  using  a  simulation  experiment  defining 

Yji  “  9i  •'  ”=  !•  1 1 

where  E(Yix)  =  #„  with  statistic 

=  A.fT,)- /f*  St  Ytl  (7) 

If  there  Is  another  output  random  variable  Yi9  i  «■»  l,  2 . /,  such  that 

E(Y>  I  Yi9  -  *,) 
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a  more  representative  sample  by  controlling  the  sampling  plan  {/„  /, . When 

such  control  Is  not  possible,  sometimes  each  observation  can  be  weighted  on  the  basis  of 
whether  Its  stratum  Is  under  or  overrepresented  In  the  sample.  This  VRT,  usually 
called  poststratificd  sampling,  we  call  poststratifying  the  samplt,  since  the  sampling  plan 
Is  not  altered.  Bratley,  Fox  and  Schrage  (1983),  Cochran  (1977),  and  Kleljnen  (1974) 
discuss  PSTRAT.  Wilson  and  Prltsker  (1984a, b)  apply  PSTRAT  In  queueing  simula¬ 
tion. 

Using  the  above  notation  for  STRAT,  consider  the  PSTRAT  estimator 


/—a  m  li 


where  Ymj  Is  the  m‘*  observation  of  Yx  for  which  Xu,  €  Ly,  and  /,  Is  the  number  of  such 
observations  of  the  /,  total.  The  7)  are  outputs  (the  result  of  sampling),  but  we  con¬ 
tinue  to  denote  them  as  /,  for  convenience. 

Provided  7,  >  o  for  /  =  2,  3 . n+i,  Zfls  an  unbiased  estimator  of  0,.  Whereas  Z , 

gives  each  observation  weight  1  //„  Z,*  gives  weight  py//y.  If  the  observations  distribute 
themselves  proportionately  (7y  =  py/,)  then  this  reduces  to  1  //,.  If  a  stratum  Is  over  or 
underrepresented,  p,  //,  Is  less  or  greater  then  1  //„  respectively.  Thus  PSTRAT 
corrects  for  disproportionate  sampling,  In  contrast  to  CV,  which  corrects  for  shifts  In 
location.  Figure  6  shows  how  PSTRAT  combines  the  strata  sample  sizes  as  auxiliary 
Information  (AI)  with  the  prior  knowledge  of  the  strata  probabilities  to  obtain  the  new 
estimator  (El).  In  our  taxonomy  PSTRAT  Is  more  closely  related  to  CV  than  to 
STRAT,  since  Jy  In  PSTRAT  Is  random  and  therefore  an  output  requiring  AI,  while  In 
STRAT  I)  Is  a  known  constant  and  therefore  simply  part  of  the  new  estimator. 
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The  STRAT  estimator  Is 


■  4-1 

.=  If  -  /, 
)  -a 


zi  -  *  *s‘Pi  (/;-*  ^  y„) 

which  requires  arbitrary  control,  of  the  new  /**  stratum  sample  size  1}  and  prior 
knowledge  of  P  (.!&  €  Lt ),  denoted  by  pi ,  for  each  stratum  j  =  a.  s . «  +1 . 

Figure  5  shows  that  STRAT  reallocates  the  number  of  observations  per  stratum 
(SA)  using  the  prior  knowledge  of  the  strata  definitions  and  then  rewelghts  the  outputs 
In  the  estimator  by  the  ratio  Pi  //'  (El).  Generalization  to  stratifying  on  a  random  vec¬ 
tor  rather  than  a  scalar  random  variable  does  not  affect  the  decomposition.  Also, 
whether  or  not  the  new  sample  allocation  yields  a  variance  reduction  does  not  affect  the 
decomposition.  Allocation  strategies  are  not  discussed  here  (see  for  instance,  Cochran, 
1077),  but  proportional  allocation  (//  =  7,P,)  guarantees 

Var  (Z{)  <  Var  (Zt) 

(Rubinstein,  1981).  If  the  /,•  are  not  altered  by  fixing  them  In  advance,  then  the  VRT  Is 
known  as  poststratlfying  the  sample;  see  section  4.5  below. 


Figure  5  about  here 


4.5  Poststratlfying  the  Sample  (P  STRAT) 

A  source  of  variability  In  all  sampling  experiments  Is  that  the  sample  Is  not 
representative  of  the  population  sampled.  Using  proportional  allocation,  STRAT  forces 
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between  the  two  systems;  as  in  the  CRN  example  (section  4.2),  both  systems  are  part  of 
the  same  simulation  experiment  In  our  definition. 

4.4  Stratified  Sampling  (STRAT) 

STRAT  Is  a  technique  that  replaces  simple  random  sampling  with  a  sampling  plan 
designed  to  reduce  variance.  Hammersley  and  Handscomb  (1064)  and  Rubinstein  (1081) 
discuss  stratified  sampling  In  the  context  of  Monte  Carlo  problems  and  Cochran  (1077) 
discusses  the  context  of  survey  sampling.  Books  containing  chapters  dealing  with 
stratified  sampling  specifically  In  systems  simulation  are  Kleljnen  (1074)  and  Bratley, 
Fox  and  Schrage  (1083). 

Consider  estimating  9X  when  It  Is  possible  to  sample  /,  observations  of  Yilt  where 
E(Y{ ,)  —  •„  »  =  l,  2 . /,.  The  crude  estimator  of  p,  might  be 

Now  suppose  Y(l  can  be  expressed  as  a  function  of  (X(<1),  X# )  for  some  fixed  column 
Index  k  of  X.  For  notatlonal  convenience  write 

*.  =  *.(*»)  '(6) 
suppressing  the  XVl).  Assume  that  X*  are  l.l.d.  random  variables  for  all  »,  and  that  the 

range  of  X*  can  be  divided  Into  n  nonoverlapping,  exhaustive  strata  (Intervals).  Denote 

these  strata  by  Ly  ,  j  =2, 3, ....  n  +i.  An  equivalent  way  to  view  (a)  Is 

Ymj  =  9njiXk  )  j  =  2,  3 . »  +1  m  =  1,  2 . /y 

such  that  Ym ,  Is  the  m“  observation  of  Yx  for  which  the  associated  random  variable 
Xu  e  Lj ,  and 
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where  t  Isa  scalar,  and  the  ratio  estimator 


Zt 

The  function  «s  Is  the  control  variate. 


Both  (3)  and  (4)  are  of  the  form 


(4) 


Zt  =  JM'.OW.'aOw))  (5) 

with  the  property  that  A  (0„a)  =  0,.  Several  authors  have  noted  that  these  two  estima¬ 
tors  are  similar.  Including  Kleljnen  (1974)  and  Isakl  (1983). 


As  shown  In  Figure  4,  statistics  such  as  (5)  are  obtained  by  a  composite  transfor¬ 
mation  that  first  augments  the  argument  with  output  Ym,  which  Is  AI,  then  modifies 
the  statistic  A  ,  which  Is  EL 


Figure  4  about  here 


Both  (3)  and  (4)  extend  naturally  to  multiple  control  variates,  which  does  not 
change  the  decomposition.  Whether  A  is  a  constant  or  Is  estimated  from  the  outputs 
also  does  not  change  the  decomposition. 

In  the  simulation  literature,  a  distinction  Is  made  between  "Internal"  control  vari¬ 
ates  (random  variables  that  are  part  of  the  same  real  or  conceptual  system)  and  "exter¬ 
nal"  control  variates  (random  variables  that  are  part  of  a  similar  real  or  conceptual  sys¬ 
tem).  This  distinction  Is  Important  In  our  taxonomy.  Internal  CV,  shown  In  Figure  4, 
makes  use  of  Inherent  correlation  within  the  single  system.  However,  external  control 
variates  employ  an  additional  DI  transformation  to  Induce  statistical  dependence 
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This  simple  version  of  CRN  Is  shown  In  Figure  3. 

Figure  3  about  here 

The  desired  positive  correlation  between  the  outputs  Is  not  guaranteed  merely  by 
Inducing  positive  correlation  between  the  Inputs.  However,  analogous  to  antithetic  vari¬ 
ates,  If  the  Inverse  cdf  Is  used  to  generate  the  Inputs,  then  monotonlclty  of  the  ga  func¬ 
tions  ensures  a  favorable  covariance  term.  Here  again,  whether  a  variance  reduction  Is 
achieved  Is  not  relevant  to  the  decomposition  of  CRN.  Similarly,  the  decomposition  Is 
the  same  when  the  Inputs  are  an  historical  trace  or  when  nondetermlnlstlc  methods 
(such  as  blocking  In  the  experimental  design)  are  used. 

4.3  Control  Variates  (CV) 

By  the  term  control  variates  we  refer  to  statistics  that  attempt  to  correct  the 
value  of  an  estimator  based  on  the  discrepancy  between  the  value  of  a  second  estimator 
and  the  known  value  of  Its  expectation.  For  example,  let  y(l)  and  Ym  be  sets  of  output 
random  variables  In  a  simulation  experiment,  and  «,  and  i,  be  known  scalar-valued 
functions  such  that 

where  0,  and  a  are  real  scalars;  0,  Is  the  performance  measure  of  Interest  and  a  Is  known. 
The  two  most  common  CV  estimators  of  0,  are  the  linear  control 

Zt  =tl(Y{l))-KtjYw)-a)  (3) 
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where  aa  and  a,  are  real,  scalar  constants,  using  a  simulation  experiment  defining 

Ya  “  9a  (-X\a>)  *  “*  l,  2 . 7|  1  =  2,  3 

where  E(Ya)  —  a,,  with  statistic 

/j  ig 

% i  =  79_1-  £  r<9-/.-‘ E  ?3-  F. 

<-l  i — 1 

The  basis  for  CRN  Is  the  well-known  relation 

Var  (Fa  -  ?8)  =  Var  (Fa)  +  Var  (F.)  -  2  Cov  (Fa,  ?,) 

Aggregating  the  Individual  Inputs  Xa  Into  two  sets  of  Inputs  corresponding  to  the 
two  systems,  we  can  write 

Y,=t,(X(l))  /  =  2,  3 

which  defines  two  aggregated  sets  of  outputs.  The  original  experiment  typically  has  Xw 
and  AT(S)  Independent;  that  Is,  the  two  systems  are  realized  using  different  sequences  or 
U(0,1)  random  numbers.  CRN  redefines  the  Joint  distribution  of  (AT(a).  AT,,,),  without 
changing  their  multivariate  marginal  distributions.  In  a  way  the  practitioner  hopes  will 
Induce  Cov  (Fa,  F#)  >  o  and  In  turn  a  variance  reduction  for  Zx.  Thus,  CRN  consists  of  a 
single  transformation  from  the  DI  class. 

If  pairs  of  scalar  Inputs,  say  (Xia,  Xt),  can  be  Identified  such  that  each  pair  Is 
Independent  of  all  other  pairs,  Xi9  €  Xw,  and  X»  €  X{t),  then  positive  correlation  can  be 
Induced  within  each  pair  by  generating  observations  with  the  same  U(0,1)  random 
number  sequence  using  the  Inverse  cdf 

Xa=Fa-\U{)  1=2.3 

which  results  In  the  maximum  achievable  Cov  (A;a,  Xit)  and  Joint  cdf 

min  {F,a(x,a),  F,g(*,»)} 
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